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Abstract 

^-p"! ' We propose a non-parametric statistical procedure for detecting multiple change-points in 

multidimensional signals. The method is based on a test statistic that generalizes the well- 
known Kruskal-Wallis procedure to the multivariate setting. The proposed approach does not 
require any knowledge about the distribution of the observations and is parameter-free. It is 
computationally efficient thanks to the use of dynamic programming and can also be applied 
^ ^ \ when the number of change-points is unknown. The method is shown through simulations 

to be more robust than alternatives, particularly when faced with atypical distributions (e.g., 
^SJ ' with outliers), high noise levels and/ or high-dimensional data. 
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(N ! 1 Introduction 

O 

Retrospective multiple change-point estimation consists in partitioning a non-stationary series 
of observations into several contiguous stationary segments of variable durations HI. It is par- 
ticularly appropriate for analyzing a posteriori time series in which the quantity driving the 
' behavior of the time series jumps from one level to another at random instants called change- 

$H . points. Such a task, also known as temporal signal segmentation in signal processing, arises 

in many applications, ranging from EEG to speech processing and network intrusion detection 

ma. 

The classic setting for change-point detection and estimation is the case where the sequence 
of observations is univariate and the use of the least-squares criterion is considered to be ade- 
quate, for instance, in the presence of Gaussian noise f?. '5^. When there are multiple change- 
points to be detected, the composite least-squares criterion can be efficiently optimized using a 
dynamic programming recursion whose computational cost scales like the square of the number 
of observations 16(|7(|8|. The approach can be generalized to multivariate settings [9]. 

In this work, we consider methods that can be applied to high-dimensional data without 
requiring strong prior knowledge of the underlying distribution of the observations. To the 
best of our knowledge, there are very few approaches in the literature that are suitable for this 
task. The first option is the kernel-based approach of LIOJ which uses a general kernel metric to 
evaluate distances between multivariate observations. This approach is powerful but requires 
the choice of an appropriate kernel function and, furthermore, it is not robust in situations 
where the noise level is very significant as will be illustrated in Section H) The other relevant 
reference is [11] which combines imivariate rank statistics using a weighting matrix derived from 
an asymptotic analysis. This approach, which is based on the principle of the Mann-Whitney 
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Wilcoxon two-sample test, is however limited to the case where, at most, a single change-point is 
present in the observed signal. Our contribution with this paper is to show that the test statistic 
of [llj can be modified so as to deal with multiple change-points (in analogy with the way the 
classical Kruskal-Wallis test generalizes the Mann- Whitney Wilcoxon statistic to multiple group 
testing [12|. Furthermore, we show that the proposed test statistic is amenable to dynamic 
programming and can thus be optimized efficiently despite the combinatorial nature of the 
multiple change-point detection task. We will show in particular that the proposed method, 
termed dynMKW (for "dynamic programming multivariate Kruskal-Wallis"), is more efficient 
than greedy strategies based on repeated use of single change-point tests as suggested in [13). 

When dealing with multiple change-points, one needs a criterion for estimating the number 
of change-points which is an instance of a model selection problem. Traditional approaches 
include the use of a complexity penalty that is added to the test statistic |5l [71 [Till or the use 
of priors on the locations of change-points when adopting the Bayesian point of view, see, 
e.g., [[I5l [HI and references therein. An original approach adopted in HZl [TS) is to consider 
the penalty associated to the use of the £i norm (or block-^i norm in [TF|). In this paper, we 
propose a simple construction which relies on the so-called "slope heuristic" and is a variant of 
the procedure described in fT4| . Although heuristic this approach is preferable to off-the-shelf 
Schwarz-like penalties (AlC, BIC, etc.) which are most often found to be badly-calibrated for 
the type of problems considered here. 

The paper is organized as follows. The statistical methodology for testing homogeneity 
in several groups of multivariate observations is described in Section [2] The procedure for 
estimating the change-points derived from this testing procedure is then described in Section [3| 
In Sections [H we report the results of numerical experiments carried out on synthetic and real 
data. 



2 Testing homogeneity within several groups of multivariate 
data 

Let Xi, ...,X„ be n L-dimensional observed random vectors such that Xy = {Xji, . . . ,Xji)' , 
where A' denotes the transpose of the matrix A. We assume in the following that Xi, . . . ,X„ are 
independent. In this section, we first consider testing the hypothesis (Hq) that K given groups, 
Xj, . . . ,X„j, X„j+i, . . . ,X„2, . . . , X„j^ j+i, . . . ,X„j^, have the same distribution, where we will use 
the convention that riQ = 1 and nf^ = n. 

For / in {!,...,«} and ^ in {1, ... , L}, let denote the rank of Xj^i among (X^ £, . . . , X„^g) 

that is R'-p = ELi '^{Xte<Xje}- Also define r[^\ for fc in {0, . . . , X - 1}, by 



We propose to use the following statistic: 

K-l 



T(«i, . . . , uk-i) = (wfc+i - nk)R'k i'n^ Rk , (1) 

where Ri^ is defined by 

and £„ is the L x L matrix whose (£, ^')-th element is defined by 



R,= (RW-„/2,...,R[^'-n/2)', (2) 



^nM' = i E{k!'^ - n/2}{RP - n/2} . (3) 
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Note that £„^£^' can be rewritten as 



Kw = - E{fn/(X/,,) - l/2}{ V(X,-,,,) - 1/2} , (4) 

where f„^({t) = n^^ Y^=i l{Xyf<f} denotes the empirical cumulative distribution function (c.d.f.) 

of the ^th coordinate f . The matrix £„ thus corresponds to an empirical estimate of the 
covariance matrix S with general term 

= Cov (Ff(Xi,f);Ff,(Xi/')) , 1 < t^' < ^ , (5) 

Fi denoting the c.d.f. of ^ which we shall assume in the sequel to be continuous. 

Observe that ^ extends to the multivariate setting the classical Kruskal-Wallis test fT2l 
which applies to the univariate data. Indeed, when L = 1, is equivalent to 

r(ni,...,«K-i) = ^ I;K+i-«(c) , (6) 

" /c=0 

where we have replaced £,,41 by En = Var(Fi(Xij)) = Var(JJ) = 1/12, where !i is a uniform 
random variable on [0,1], Fi being a continuous cumulative distribution function. In the case 
where there is only one change-point, i.e. when X = 2, ((T) reduces to the test statistic proposed 
in nn to extend the Mann-Whitney/Wilcoxon rank test to multivariate data. We state below 
(without proof for reasons of space) a result which shows that ([ij is properly normalized in the 
sense that under the null hypothesis it converges, as n increases, to a fixed limiting distribution 
that only depends on K and L. 

Theorem 1. Assume that (X;)i<;<„ are H^-valued i.i.d. random vectors such that, for all I, the c.d.f. 
F£ ofXi£ is a continuous function. Assume also that for k = 0, . . . ,K — 1, there exists in (0, 1) such 
that {nj^+i — n^) In A/^+i, as n tends to infinity. Then, T{ni, . . . , wk-i) defined in Q satisfies 

T{m,...,nK-i) ^X^{{K-1)L) , flsn^oo, (7) 

where d denotes the convergence in distribution and ;\;^((-K — 1)L) denotes the chi-square distribution 
with {K — 1)L degrees of freedom. 



3 Detecting change-point locations 

In this section, we consider applying the test statistic described in the previous section to deter- 
mine the positions of the segment boundaries «i, . . . , n^_i, starting with the simpler case where 
the number of change-points K is known. 

3.1 Estimation of a known number of change-points 

To estimate the change-point locations, we propose to maximize the statistic T{ni, . . . ,nj^^i) 
defined in ((ij with respect to the segment boundaries n^, . . . , nj^^y. 

(%,..., %_l ) = argmax T(ni, . . . , n^.i) . (8) 

l<ni<---<«K-l<" 

Of course, direct maximization of ^ is impossible as it corresponds to a combinatorial 
problem whose complexity is exponential in X. We can however perform this maximization 
efficiently using the dynamic programming algorithm described in [6]. More precisely, using 
the notations 
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and 

K-l 

hip) = ^ max A{n!, + 1 : n^+i) , 

1<«1<---<«K-1<"K=P jt=0 

we have 

Jk(p) =max{/K-i(Mic-i)+A(njc-i + l : p)} . (9) 

"K-l 

Thus, for solving the optimization problem ||8]l, we proceed as follows. We start by computing 
the A{i : j) for all (/,;') such that 1 < i < j < n. All the Ji (E) are thus available for E = 2, . . . , n. 
Then /2(E) is computed by using the recursion and so on. The overall numerical complexity 
of the procedure is thus proportional to X x n^. 



3.2 Estimation of the number of change-points 

In practice the number of change-points can rarely be assumed to be known. Although it is 
not the main focus of this paper, we propose a heuristic algorithm to find the optimal number 
of change-points. Values of the statistics /^c(n), for K = 0, . . .,Xmax/ are first computed using 
the procedure described in the previous section. The algorithm is based on the principle that 
in the presence of X* > 1 change-points, if is plotted against K, the resulting graph 

can be decomposed into two distinct linear trends: a first one, for K = 0, . . .,K* where the 
criterion is growing rapidly; and a second one, for K = K*, . . ., Xmax, where the criterion is 
barely increasing. Hence, for each possible value of X in K = 1, . . .,Xinax/ we compute least 
square linear regressions for both parts of the graph (before and after K); the estimated number 
of change-points is the value of K that yields the best fit, that is, the value for which the sum 
of the residual sums of squares computed on both parts of the graph is minimal (see Figure 
[1). The case X = is treated separately and the procedure that has just been described is 
used only when the value of T{ni) is significant, based on the asymptotic p-value for the single 
change-point case obtained in [llj. 
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4 Numerical experiments 



In this section, the performance of the dynMKW algorithm is assessed using simulated signals 
and we next consider its application to data arising from a biological context. 

4.1 Simulated data 

The simulation reported is this section is based on a fixed 5-dimensional piecewise constant 
signal of length 500 with a predefined set of four shared breakpoints. Note that this signal 
reproduces an important feature of many applications (see (S) as well as the example of the 
cancer data below) in that only a subset of its coordinates actually change simultaneously. To 
this baseline signal is added some moderately correlated Gaussian noise with marginal variance 
tj^, see Figure |2] for an example corresponding to a marginal SNR of 16 dB, where the SNR is 
defined as the ratio of the jump amplitude over cr. The performance of the algorithms is assessed 
from 1000 Monte-Carlo replications of the data for each value of cr^ and measured in terms of 
precision (proportion of correctly estimated changes found among the detected change-points) 
and recall (proportion of actual change-points that have been correctly estimated). We used a 
±1 sample tolerance for deciding whether a change-point has been correctly estimated but the 
conclusion were qualitatively the same for larger values of this tolerance. 
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Figure 2: Baseline signal with superimposed noise at an SNR of 16 dB. 



4.1.1 Known number of change-points 

We compare our algorithm with two other different methods - which also rely on a dynamic 
programming step - for multiple change-points detection: first, with the kernel-based approach 
of flOl that computes "intra-segment scatter" using an isotropic Gaussian kernel fimction and 
second, the parametric method corresponding to the multivariate Gaussian assumption. 

As shown in Figure |3] left, the parametric method ("Linear") yields the best results, as the 
noise added to the signal is indeed Gaussian. Unsurprisingly, the proposed dynMKW approach 
performs slightly worse in this scenario but is certainly preferable than the kernel test, par- 
ticularly when the level of noise increases. In Figure |3] right, we repeat the experiment with 
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significant contamination by outliers. The dynMKW method demonstrates its robustness as, 
contrary to both alternatives, its performance barely suffer from the presence of outliers. 




-4 3 6 9 13 16 -4 3 6 9 13 16 
Marginal SNR (dB) Marginal SNR (dB) 



Figure 3: Precision curves of the dynMKW, kernel-based and linear algorithms at different levels of noise for a known 
number of change-points. Left: signals with no outliers; right: signals with 5% of outliers (whose variance is 10 dB 
higher than that of the background noise). Recall curves are not displayed as they are identical in the known number 
of change-points case. 



4.1.2 Unknown number of change-points 

Using the same signal as previously, we suppose in the experiments presented here that the 
number of change-points is unknown. Our algorithm, combined with the approach for estimat- 
ing the number of change-points presented in Section l3l2l is compared with a greedy hierarchical 
method suggested by |13|: Breakpoints are detected by iteratively testing for the presence of a 
single change-point in the sub-segments determined in previous stages. This method {"Vast") is 
applied using lO restricted to X = 2 groups as a single change-point detector, where the asymp- 
totic p-value determined in ITTI is computed to decide whether the current segment should be 
segmented any further (segmentation stops whenever the p-value is larger than a threshold, say 
1 or 5%). 

Results are shown in Figure S] with precision and recall measures. It is observed that dyn- 
MKW outperforms the Vast procedure both in precision and recall. From a more qualitative per- 
spective, the Vast procedure tends to systematically over-segment the signal while our method, 
when faulting, is more inclined to miss some change-points. 

4.2 Bladder tumor CGH profiles 

We consider the bladder cancer micro-array aCGH dataseill of [TSl- The objective here is to 
jointly segment data recorded from different subjects so as to robustly detect regions of frequent 
deletions or additions of DNA which could be characteristic of cancer. Each of the 57 profiles 
provides the relative quantity of DNA for 2215 probes measured on 23 chromosomes. We ran 
the dynMKW algorithm on each chromosome separately, thus treating 23 different 9- to 57- 
dimensional signals (depending on the selected groups of patients at different stages of cancer) 
of length 50 to 200 (the number of probes varies for each chromosome). Results are shown for a 
group of 9 individuals corresponding to Stage Tl of a tumor. In Figure |5l the smoothed version 
of the whole set of probes resulting from the segmentation is displayed, while a focus is given 

: //cbio . ensmp . f r/-f rapaport/CGHf usedSVM/index .html| 
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Figure 4: Precision (top) and recall (bottom) for dynMKW with estimates of the number of change-points and Multi- 
Rank with binary segmentation (Vost.) with threshold of 0.01 and 0.05. 

on the 10th chromosome in Figure |6l In each case, the segmentation result is represented by a 
signal which is constant (and equal to the mean of the data) within the detected segments. 




Figure 5: Superposition of the smoothed bladder tumor aCGH data for 9 individuals in Stage Tl cancer that results 
from the segmentation. Vertical dashed lines represent the separation between the different chromosomes. 
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Figure 6: Data for 9 individuals in Stage Tl bladder cancer with superimposed segmentation for chromosome 10. 
The dashed vertical lines correspond to the estimated segment boundaries. 
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